Genetic Algorithms for Selection and Partitioning of Attributes in Large-Scale Data Mining Problems
Abstract
This paper proposes and surveys genetic implementations of algorithms for selection and partitioning of attributes in large-scale concept learning problems. Algorithms of this type apply relevance determination criteria to attributes from those specified for the original data set. The selected attributes are used to define new data clusters that are used as intermediate training targets. The purpose of this change of representation step is to improve the accuracy of supervised learning using the reformulated data. Domain knowledge about these operators has been shown to reduce the number of fitness evaluations for candidate attributes. This paper examines the genetic encoding of attribute selection and partitioning specifications, and the encoding of domain knowledge about operators in a fitness function. The purpose of this approach is to improve upon existing search-based algorithms (or wrappers) in terms of training sample efficiency. Several GA implementations of alternative (search-based and knowledge-based) attribute synthesis algorithms are surveyed, and their application to large-scale concept learning problems is addressed.
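As an illustration of the genetic encoding of attribute selection described above, the following is a minimal sketch rather than the paper's implementation. It assumes a bit-mask chromosome (one bit per attribute), a wrapper-style fitness based on 1-nearest-neighbour validation accuracy, and a small per-attribute penalty. The synthetic data generator, the function names, and all parameter values are hypothetical choices made for this example, and attribute partitioning is not covered.

```python
import random


def make_data(n_rows, n_attrs, relevant, rng):
    """Synthetic binary data (assumption): label = majority vote of the relevant attributes."""
    rows = []
    for _ in range(n_rows):
        x = [rng.randint(0, 1) for _ in range(n_attrs)]
        y = int(sum(x[i] for i in relevant) * 2 > len(relevant))
        rows.append((x, y))
    return rows


def accuracy(mask, train, valid):
    """Wrapper-style score: 1-NN accuracy using only the attributes selected by mask."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty attribute subset cannot classify anything
    hits = 0
    for x, y in valid:
        # Hamming distance restricted to the selected attributes
        nearest = min(train, key=lambda r: sum(r[0][i] != x[i] for i in idx))
        hits += nearest[1] == y
    return hits / len(valid)


def fitness(mask, train, valid, penalty=0.01):
    """Validation accuracy minus a small cost per selected attribute (hypothetical weighting)."""
    return accuracy(mask, train, valid) - penalty * sum(mask)


def evolve(train, valid, n_attrs, pop_size=20, generations=15, p_mut=0.05, seed=0):
    """Simple generational GA over attribute bit masks; returns the best mask found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_attrs)] for _ in range(pop_size)]
    cache = {}  # avoid re-evaluating masks that were already scored

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, train, valid)
        return cache[key]

    for _ in range(generations):
        nxt = sorted(pop, key=score, reverse=True)[:2]  # elitism: keep the two best
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=score)  # tournament selection
            b = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n_attrs)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=score)


if __name__ == "__main__":
    rng = random.Random(1)
    data = make_data(100, 10, relevant=[0, 1, 2], rng=rng)
    train, valid = data[:60], data[60:]
    best = evolve(train, valid, n_attrs=10)
    print("selected attributes:", [i for i, bit in enumerate(best) if bit])
```

The cache in `evolve` is only a crude way to save fitness evaluations by not rescoring duplicate masks. The abstract's idea of encoding domain knowledge about operators in the fitness function would replace or extend the flat per-attribute penalty used here.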
Similar resources
Genetic Algorithms for Reformulation of Large-Scale KDD Problems with Many Irrelevant Attributes
The goal of this research is to apply genetic implementations of algorithms for selection, partitioning, and synthesis of attributes in large-scale data mining problems. Domain knowledge about these operators has been shown to reduce the number of fitness evaluations for candidate attributes. We report results on genetic optimization of attribute selection problems and current work on attribute ...
Using Data Mining and Three Decision Tree Algorithms to Optimize the Repair and Maintenance Process
The purpose of this research is to predict the failure of devices using a data mining tool. For this purpose, a database of 392 records of ongoing failures at a pharmaceutical company in 1394 was first prepared. Next, 9 characteristics were analyzed, with the type of failure as the class attribute. In this regard, three decision tree algorithms have ...
A Comparative Study between a Pseudo-Forward Equation (PFE) and Intelligence Methods for the Characterization of the North Sea Reservoir
This paper presents a comparative study between three versions of adaptive neuro-fuzzy inference system (ANFIS) algorithms and a pseudo-forward equation (PFE) to characterize the North Sea reservoir (F3 block) based on seismic data. According to the statistical studies, four attributes (energy, envelope, spectral decomposition and similarity) are known to be useful as fundamental attributes in ...
A Comprehensive Study of Several Meta-Heuristic Algorithms for Open-Pit Mine Production Scheduling Problem Considering Grade Uncertainty
Finding a global optimum is important in problems of large dimensional scale, because it improves the quality of decision-making in mining operations. It has been broadly confirmed that the long-term production scheduling (LTPS) problem plays a major role in mining projects to develop the performance regarding the obtainability of constraints, while maximizing the whole...
Solving Re-entrant No-wait Flexible Flowshop Scheduling Problem; Using the Bottleneck-based Heuristic and Genetic Algorithm
In this paper, we study the re-entrant no-wait flexible flowshop scheduling problem with a makespan minimization objective and then consider two parallel machines for each stage. The main characteristic of a re-entrant environment is that at least one job is likely to visit certain stages more than once during the process. The no-wait property describes a situation in which every job has its own ...
Publication date: 2002